Objective
Importing a few libraries needed for this analysis.
library(tidyverse)
library(forecast)
library(dygraphs)
library(astsa)
library(urca)
library(ggthemr)
There are two pieces of data I was able to pull from the source.
These are downloaded as csv files, and some preprocessing is done to shape them in a format ready for modeling.
df <- read_csv("../data/IPG2211A2N.csv",col_names = c("date","ip_index"),skip = 1)
recession <- read_csv("../data/recession.csv",col_names = c("start","end"),skip = 1)
recession_dates = recession %>%
  mutate(range = map2(start, end, ~seq(from = .x, to = .y, by = "month"))) %>%
  select(range) %>%
  unnest(range) %>%
  pull()
head(df)
head(recession)
The electric and gas index is stored as a time series (ts) object, since many of the forecasting and plotting functions prefer ts over data.frame or tibble objects.
egip <- ts(data = df$ip_index,
start = 1939,
frequency = 12,
names = "egip")
head(egip,18)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1939 3.3842 3.4100 3.4875 3.5133 3.5133 3.5650 3.5650 3.6167 3.7200 3.7200
## 1940 3.7717 3.8233 3.8492 3.8492 3.8750 3.9267
## Nov Dec
## 1939 3.7458 3.7458
## 1940
Let’s visualize the data. The graph shows the monthly values of Electric Gas Industrial Production (EGIP). The gray bars show the recessions. Initial observations:
dygraph_plotter <- function(ts, recession, title = NULL){
command <- paste0("dygraph(",ts,",xlab = 'Year', ylab = 'Industrial Production: Electric & Gas', main = '", title, "') %>% ",
glue::glue("dyShading(from = recession$start[{x}],to = recession$end[{x}]) %>% ",x=1:nrow(recession)) %>% paste0(collapse = ""),
"dyRangeSelector()")
eval(parse(text = command))
}
dygraph_plotter("egip", recession)
The seasonplots plot the time series data split by each “season”, in our case, by “Year”. This plot clearly shows what we’ve already noticed - levels go up over the years, seasonality becomes more pronounced.
ggseasonplot(egip)
Zooming into the data from Y:2000, we can see that - on average - EGIP increases every year for all the months.
ggsubseriesplot(window(egip, start=2000))+labs(y="EGIP")
Considering these initial observations, here’s the modeling approach I am proposing:
Note:
I investigated 2(a) for most of the models listed in 3, and found that they underperformed considerably compared to approach 2(b). In particular, they all overestimated the trend component, resulting in consistently biased forecasts. Approach 2(b), on the other hand, estimated the forecasts much better. Thus, in the rest of the report, I will only discuss approach 2(b).
Here, I split the data into training (1984 - 2014) and testing (>2014). I’m keeping 5 years in testing since the task is to forecast out 5 years, and this is a typical practice. Note - this is where I’m subsetting to data 1984+ only.
egip_train = window(egip, start = 1984, end = 2013+11/12)
egip_test = window(egip, start = 2014)
h = length(egip_test) #forecast length
I used BoxCox.lambda to estimate the value of lambda for the Box-Cox transformation. Transforming the series with this value, we can see that the variance is much more stable than in the original series. Variable names have the prefix “t” to indicate the transformed series.
lam = BoxCox.lambda(egip_train)
tegip_train = BoxCox(egip_train, lam)
tegip_test = BoxCox(egip_test, lam)
dygraph_plotter("tegip_train", recession, title = paste0("Box Cox Transformed Training Data. Lambda = ", round(lam,4)))
Running a simple seasonal naive forecast first. This can also serve as a baseline - there is no reason to select anything more complex if a seasonal naive works well. We can see that it works reasonably well, though we do miss a few peaks and troughs.
snaiveForecast <- snaive(tegip_train,h = h)
snaiveForecast %>% autoplot() + autolayer(tegip_test) +
scale_x_continuous(limits = c(2010,2019),breaks = seq(2010,2019))+
scale_y_continuous(limits=c(3,3.25))
snaiveForecast %>% accuracy(tegip_test)
## ME RMSE MAE MPE MAPE
## Training set 0.008147698 0.01772244 0.01434680 0.26860008 0.4692704
## Test set 0.003073970 0.01632939 0.01215974 0.09431929 0.3861878
## MASE ACF1 Theil's U
## Training set 1.0000000 0.5649622 NA
## Test set 0.8475575 0.3158671 0.3809585
The Box-Jenkins framework might prove promising, given the seasonality present in the dataset. The first step is to make the non-stationary data stationary. A quick investigation into how many differences to consider can be done using ndiffs and nsdiffs.
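As a quick sketch of that check (the exact counts depend on the tests' defaults, so treat the results as a starting point rather than a final answer):

```r
library(forecast)
# Suggested number of seasonal differences (based on a seasonal strength measure)
nsdiffs(tegip_train)
# Suggested number of regular differences after seasonal differencing (KPSS-based)
ndiffs(diff(tegip_train, lag = 12))
```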
tegip_train %>%
diff() %>%
dygraph() %>%
dyOptions(drawPoints = TRUE, pointSize = 2) %>%
dyLimit(mean(diff(tegip_train),na.rm=T),color = "red") %>%
dyRangeSelector()
Checking for stationarity: the first-differenced series is stationary, since the KPSS test statistic (0.0483) is below even the 10% critical value, so we fail to reject the null hypothesis of stationarity.
tegip_train %>% diff(d = 1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0483
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
With only a seasonal difference, there is still some trend left, and as a result the series fails the KPSS stationarity test - the test statistic (1.0149) exceeds even the 1% critical value.
tegip_train %>%
diff(lag = 12) %>%
dygraph() %>%
dyOptions(drawPoints = TRUE, pointSize = 3) %>%
dyLimit(mean(diff(tegip_train, lag = 12), na.rm = TRUE), color = "red") %>%
dyRangeSelector()
tegip_train %>% diff(lag = 12) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 1.0149
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Adding a first difference on top of the seasonal difference makes the signal stationary; the KPSS test confirms this.
tegip_train %>%
diff(lag = 12) %>%
diff() %>%
dygraph() %>%
dyOptions(drawPoints = TRUE, pointSize = 3) %>%
dyLimit(mean(diff(diff(tegip_train, lag = 12)), na.rm = TRUE), color = "red") %>%
dyRangeSelector()
tegip_train %>% diff(lag = 12) %>% diff() %>% ur.kpss() %>% summary() # Stationary
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.0192
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Let’s stick with the seasonal difference (D = 1 at lag 12) plus a first difference (d = 1); I suspect the seasonal difference will be required given the seasonality. The ACF/PACF plots should tell us more.
d1D12tegip_train = diff(diff(tegip_train, lag=12))
What do the ACF plots tell us?
d1D12_acf = acf2(d1D12tegip_train, plot = T, max.lag = 12*4)
Note: in these graphs, LAG is shown in units of 12 months.
For the d1D12 data, there is a sharp cutoff in the seasonal component (integer LAG values) of the ACF, and a slower decay in the PACF. This could indicate an MA(2) for the seasonal component (Q = 2). The ‘p’ and ‘q’ components are hard to infer visually; I could guess p = 2 and q = 2 based on the sub-LAG components.
So a SARIMA(2,1,2)(0,1,2)[12] on the Box-Cox transformed series is one potential model; perhaps also a SARIMA(0,1,1)(0,1,2)[12]. Let’s investigate.
ARIMA MODEL 1
In the SARIMA(0,1,1)(0,1,2)[12], the residuals look quite normal (except for a few outliers at the edges). The ACF plot shows a few components outside the limits, which the joint Ljung-Box test affirms: its p-values are below 0.05 for all lag values, so we reject the null hypothesis that the residuals are white noise.
arimaFit1 <- sarima(xdata = tegip_train,
p = 0,d = 1,q = 1,
P = 0,D = 1,Q = 2,S = 12)
## initial value -4.224828
## iter 2 value -4.412504
## iter 3 value -4.455195
## iter 4 value -4.466665
## iter 5 value -4.473913
## iter 6 value -4.474783
## iter 7 value -4.474933
## iter 8 value -4.474940
## iter 9 value -4.474941
## iter 10 value -4.474941
## iter 11 value -4.474942
## iter 11 value -4.474942
## iter 11 value -4.474942
## final value -4.474942
## converged
## initial value -4.484203
## iter 2 value -4.486515
## iter 3 value -4.487276
## iter 4 value -4.487507
## iter 5 value -4.487537
## iter 6 value -4.487538
## iter 7 value -4.487538
## iter 7 value -4.487538
## iter 7 value -4.487538
## final value -4.487538
## converged
arimaFit1$fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ma1 sma1 sma2
## -0.3967 -0.7494 -0.0656
## s.e. 0.0784 0.0632 0.0633
##
## sigma^2 estimated as 0.000122: log likelihood = 1064.8, aic = -2121.61
ARIMA MODEL 2
SARIMA(2,1,2)(0,1,2)[12] is much more promising. We pass the Ljung-Box test in this instance, and the residuals look good too.
arimaFit2 = sarima(xdata = tegip_train,
p = 2,d = 1,q = 2,
P = 0,D = 1,Q = 2,S = 12)
## initial value -4.239317
## iter 2 value -4.451477
## iter 3 value -4.472258
## iter 4 value -4.489662
## iter 5 value -4.494134
## iter 6 value -4.507379
## iter 7 value -4.513832
## iter 8 value -4.515320
## iter 9 value -4.516732
## iter 10 value -4.518059
## iter 11 value -4.518925
## iter 12 value -4.519182
## iter 13 value -4.519207
## iter 14 value -4.519219
## iter 15 value -4.519223
## iter 16 value -4.519226
## iter 17 value -4.519228
## iter 18 value -4.519229
## iter 19 value -4.519230
## iter 20 value -4.519230
## iter 21 value -4.519230
## iter 21 value -4.519230
## iter 21 value -4.519230
## final value -4.519230
## converged
## initial value -4.533789
## iter 2 value -4.540340
## iter 3 value -4.543668
## iter 4 value -4.545936
## iter 5 value -4.546073
## iter 6 value -4.546278
## iter 7 value -4.546301
## iter 8 value -4.546302
## iter 9 value -4.546302
## iter 10 value -4.546303
## iter 11 value -4.546305
## iter 12 value -4.546307
## iter 13 value -4.546309
## iter 14 value -4.546309
## iter 14 value -4.546309
## final value -4.546309
## converged
arimaFit2$fit
##
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D,
## Q), period = S), include.mean = !no.constant, optim.control = list(trace = trc,
## REPORT = 1, reltol = tol))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sma1 sma2
## -0.0552 0.2366 -0.3058 -0.5497 -0.7459 -0.0671
## s.e. 0.2374 0.1463 0.2257 0.2101 0.0635 0.0640
##
## sigma^2 estimated as 0.0001081: log likelihood = 1085.2, aic = -2156.4
ARIMA MODEL 3
As a quick check, let’s see what forecast::auto.arima gives us. We get a seasonal AR component (P = 2) and a smaller seasonal MA component (Q = 1). This is a better model than the two above (AIC of -2158, versus -2156 and -2122), and the Ljung-Box test passes as well. Let’s continue with this model.
arimaFit3 <- auto.arima(tegip_train)
arimaFit3
## Series: tegip_train
## ARIMA(2,1,2)(2,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sar2 sma1
## 1.5034 -0.5237 -1.8936 0.8981 -0.0034 -0.1699 -0.7619
## s.e. 0.0741 0.0633 0.0515 0.0485 0.0735 0.0658 0.0583
##
## sigma^2 estimated as 0.0001091: log likelihood=1087.22
## AIC=-2158.44 AICc=-2158.02 BIC=-2127.65
arimaFit3 %>% checkresiduals()
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,2)(2,1,1)[12]
## Q* = 20.245, df = 17, p-value = 0.2619
##
## Model df: 7. Total lags used: 24
This model performs quite well, with consistent results on the test dataset as well.
arimaFit3 %>% forecast(h=h) %>% autoplot() + autolayer(tegip_test) +
scale_x_continuous(limits = c(2010,2019),breaks = seq(2010,2019))+
scale_y_continuous(limits=c(3,3.25))
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
arimaFit3 %>% forecast(h=h) %>% accuracy(tegip_test)
## ME RMSE MAE MPE MAPE
## Training set -0.0004500184 0.01015022 0.00782325 -0.01461536 0.2552590
## Test set -0.0042431364 0.01433919 0.01106671 -0.13747594 0.3527257
## MASE ACF1 Theil's U
## Training set 0.5452957 -8.050138e-05 NA
## Test set 0.7713708 3.211888e-01 0.3423345
Here, I wanted to check whether introducing a regressor - the presence/absence of recession windows as an indicator variable - improves the ARIMA model in any capacity. I’ve also added each month as an indicator variable. The recession indicator doesn’t seem significant. (I also tried other indicators - number of years after the recession ended, etc. - and none were significant.) Although this model has a slightly smaller AIC, the residuals do not pass the Ljung-Box test, so I’m not continuing with this approach.
recession_end_years = recession %>% mutate(years = lubridate::year(end)) %>% pull(years)
df <- df %>% mutate(rec_indicator = ifelse(date %in% recession_dates,1,0))
months <- as.factor(lubridate::month(df$date))
months <- model.matrix(~months)[,-1]
dfa <- df %>% bind_cols(as_tibble(months))
xreg_mat_train <- as.matrix(dfa[dfa$date>"1983-12-01" & dfa$date<"2014-01-01",-1:-2])
xreg_mat_test <- as.matrix(dfa[dfa$date>"2013-12-01",-1:-2])
arima_xreg_Fit <- auto.arima(tegip_train, stepwise = F,
xreg = xreg_mat_train,
trace = F)
arima_xreg_Fit
## Series: tegip_train
## Regression with ARIMA(2,1,2)(1,0,0)[12] errors
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 drift rec_indicator
## -0.0181 0.2639 -0.2904 -0.5569 0.1954 7e-04 -0.0007
## s.e. 0.2387 0.1549 0.2256 0.2086 0.0548 1e-04 0.0035
## months2 months3 months4 months5 months6 months7 months8
## -0.0321 -0.0626 -0.1091 -0.1116 -0.0719 -0.0365 -0.0348
## s.e. 0.0025 0.0032 0.0035 0.0036 0.0037 0.0037 0.0037
## months9 months10 months11 months12
## -0.0767 -0.1097 -0.0936 -0.0317
## s.e. 0.0036 0.0035 0.0032 0.0025
##
## sigma^2 estimated as 0.0001083: log likelihood=1138.26
## AIC=-2238.53 AICc=-2236.29 BIC=-2164.74
arima_xreg_Fit %>% checkresiduals()
##
## Ljung-Box test
##
## data: Residuals from Regression with ARIMA(2,1,2)(1,0,0)[12] errors
## Q* = 27.606, df = 6, p-value = 0.0001115
##
## Model df: 18. Total lags used: 24
ETS models use exponentially weighted past observations to produce new forecasts. There are quite a few available models, depending on whether the Error, Trend and Seasonal components are Additive, Additive Damped, Multiplicative, etc. We can search for the best possible option (based on AICc) using ets. Here, it selects an (M,Ad,M) model - multiplicative errors, additive damped trend, and multiplicative seasonality.
The very small value of beta indicates that the slope changes little. The damping was important to add to this model; without it, the model completely overshot the forecast.
etsFit <- ets(tegip_train, damped = TRUE) # damped = NULL lets the model select if damping is needed
etsFit
## ETS(M,Ad,M)
##
## Call:
## ets(y = tegip_train, damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.5472
## beta = 1e-04
## gamma = 0.1807
## phi = 0.9739
##
## Initial states:
## l = 2.9115
## b = 0.0015
## s = 1.0088 0.9887 0.9837 0.9944 1.0058 1.0055
## 0.9955 0.9857 0.9897 1.004 1.0147 1.0236
##
## sigma: 0.0037
##
## AIC AICc BIC
## -1091.922 -1089.916 -1021.972
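To illustrate the overshooting point, one can refit without damping and compare the two - a quick exploratory sketch (the undamped fit and its criteria are not shown in this report):

```r
# Refit without damping, then compare information criteria and forecast paths
etsFit_nodamped <- ets(tegip_train, damped = FALSE)
c(damped = etsFit$aicc, undamped = etsFit_nodamped$aicc)
autoplot(forecast(etsFit, h = h)) +
  autolayer(forecast(etsFit_nodamped, h = h)$mean, series = "undamped")
```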
etsFit %>% forecast(h=h) %>% autoplot()+ autolayer(tegip_test) +
scale_x_continuous(limits = c(2010,2019),breaks = seq(2010,2019))+
scale_y_continuous(limits=c(3,3.25))
While the training set metrics are only decent (MASE = 0.6), the test set performance is quite poor (MASE of 1.12).
etsFit %>% forecast(h=h) %>% accuracy(tegip_test)
## ME RMSE MAE MPE MAPE
## Training set 0.0008987398 0.01106920 0.008620047 0.02860499 0.2815743
## Test set -0.0121337058 0.01964737 0.016046373 -0.38978027 0.5128202
## MASE ACF1 Theil's U
## Training set 0.6008339 0.1436723 NA
## Test set 1.1184632 0.4408929 0.4740935
STL decomposition - splitting a time series into seasonal, trend and irregular components using LOESS - can be used for forecasting as well. It’s a simple model to execute, with a few tunable parameters like t.window and s.window, which control the width of the windows used for trend and seasonal extraction. t.window indicates how many consecutive points to use for trend extraction; lower numbers allow more flexible trends. s.window indicates how many consecutive years to use when calculating the seasonal components; lower numbers allow seasonality to change faster.
I’ve played around with the numbers a bit to get to a state I liked. The trend is fairly smooth. The seasonality captures some interesting insights: in 1985, there is a predominant “W” shape to the seasonality, while towards 2015 it’s more like a “double-U” shape.
stlFit <- tegip_train %>% stl(t.window = 15, s.window = 5)
autoplot(stlFit)
Do the residuals of this STL decomposition still carry information? YES. The ACF shows significant lags which indicates the residuals are not white noise.
checkresiduals(stlFit$time.series[,"remainder"])
Forecasting with this method, the seasonal component is forecast using a seasonal naive approach, while the seasonally adjusted remainder is forecast using a random walk. We can tell that this method is worse than simply using a seasonal naive model.
stlForecast <- stlFit %>% forecast(method = "naive", h = h)
stlForecast %>% autoplot() + autolayer(tegip_test) + scale_x_continuous(limits = c(2010,2019)) + scale_y_continuous(limits=c(3,3.25))
stlForecast %>% accuracy(tegip_test)
## ME RMSE MAE MPE MAPE
## Training set 0.0006738193 0.007321679 0.005832362 0.02191946 0.1906848
## Test set -0.0116026975 0.018674608 0.015457999 -0.37237723 0.4935960
## MASE ACF1 Theil's U
## Training set 0.4065269 -0.1948613 NA
## Test set 1.0774524 0.4039056 0.4503383
Prophet is a forecasting tool implemented by our colleagues at Facebook, which can produce quite sophisticated forecasts given a large number of input and tunable parameters. I’ll be honest - I’ve only used it a handful of times at work, mostly to test its outlier resistance, so it’s quite a black box for me. I’m using it here to test how well it does “out of the box”, but I won’t spend any time tuning it.
library(prophet)
proph_df_train <- tibble(ds = seq(as.Date("1984-01-01"),as.Date("2013-12-01"),by = "month"),
y = as.numeric(tegip_train))
m <- prophet(proph_df_train)
## Initial log joint probability = -2.07089
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
future <- make_future_dataframe(m, periods = h, freq = "month")
forecast <- predict(m, future)
dyplot.prophet(m, forecast)
prophet_forecast <- ts(forecast$yhat[forecast$ds>"2013-12-01"], start = start(tegip_test), frequency = frequency(tegip_test))
Q = sum(abs(diff(tegip_test,lag = frequency(tegip_test))))/(length(tegip_test)-frequency(tegip_test))
MASE = mean(abs(tegip_test - prophet_forecast))/Q
RMSE = sqrt(mean((tegip_test - prophet_forecast)^2))
prophet_yhat = ts.intersect(prophet_forecast, tegip_test)
plot(prophet_yhat,plot.type="single",col=1:2,ylab="EGIP", sub="Red: Actual, Black: Forecast",
main = paste0("Test Set RMSE: ",round(RMSE,3), " ", "Test Set MASE: ", round(MASE,3)))
The residuals for this model are decent. I see a few locations where ACF and PACF are violating the limits, and the Ljung-Box test does fail at the 0.05 level.
prophet_residuals = prophet_yhat[,1]-prophet_yhat[,2]
acf_prophet = acf2(prophet_residuals)
Box.test(prophet_residuals,lag = 36, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: prophet_residuals
## X-squared = 56.571, df = 36, p-value = 0.01581
There are a few packages that perform ensemble time series modeling, like forecastHybrid, though I haven’t used them much. However, the following “benchmarking” code (obtained from here) is something Hyndman wrote which beautifully takes a few models, runs all combinations of simple averaging across their forecast values, and returns the best possible combination. Often, an average of forecasts outperforms any single model.
Here, I’m combining Seasonal Naive, ETS, ARIMA, STL and Prophet forecasts. In total, this code will check 31 combinations. Each letter here corresponds to a model. Thus, “NE” = Average of “SNaive” & “ETS”.
N = Seasonal Naive, E = ETS, A = ARIMA, S = STL, P = Prophet
benchmarks <- function(y, h) {
require(forecast)
# Compute the five benchmark forecasts
fcasts <- rbind(
N = snaive(y, h)$mean,
E = forecast(ets(y), h)$mean,
A = forecast(auto.arima(y), h)$mean,
S = stlf(y, h = h)$mean,
P = prophet_forecast)
colnames(fcasts) <- seq(h)
method_names <- rownames(fcasts)
# Compute all possible combinations
method_choice <- rep(list(0:1), length(method_names))
names(method_choice) <- method_names
combinations <- expand.grid(method_choice) %>% tail(-1) %>% as.matrix()
# Construct names for all combinations
for (i in seq(NROW(combinations))) {
rownames(combinations)[i] <- paste0(method_names[which(combinations[i, ] > 0)],
collapse = "")
}
# Compute combination weights
combinations <- sweep(combinations, 1, rowSums(combinations), FUN = "/")
# Compute combinations of forecasts
return(combinations %*% fcasts)
}
benchmarks_out <- benchmarks(tegip_train, h)
This plot shows ALL the 31 combinations together - not too much insight here; quite an eye chart.
ts(t(benchmarks_out),start=start(egip_test),frequency = 12) %>% autoplot() + autolayer(tegip_test,color='black',lty=2)
If I pick MAE as the metric of performance to compare these models, here I plot the distribution of MAE for the test set, sorted by the median MAE value. The top three models here are - AP, NA and NAP. Let’s plot them in a time series plot next.
ggthemr("fresh")
benchmarks_df = t(benchmarks_out) %>% as_tibble()
benchmarks_df$y = tegip_test
benchmarks_df = benchmarks_df[,32] %>% bind_cols(benchmarks_df[,-32])
#MAE
mean_MAE = benchmarks_df[,-1] %>% map_df(~abs(.x-benchmarks_df$y)) %>% gather() %>% group_by(key) %>% summarize(m=median(value)) %>% mutate(key = factor(key,levels=key[order(m)]))
benchmarks_df[,-1] %>% map_df(~abs(.x-benchmarks_df$y)) %>% gather() %>% mutate(key = factor(key, levels = levels(mean_MAE$key))) %>% ggplot(aes(key,value))+geom_boxplot() + coord_flip() + labs(x="Model",y="Test Set MAE")
ggthemr_reset()
Visually, these are extremely similar in performance. I’m picking the “AP” model for the final forecasts.
ts(t(benchmarks_out[c("AP","NA","NAP"),]),start=start(egip_test),frequency = 12) %>% autoplot() + autolayer(tegip_test,color='black',lty=2)
Running the models again on the full (train+test) portions of the data, for Prophet and ARIMA models, and calculating their mean values to estimate the 5-year forecast post Jan-2019.
tegip = BoxCox(window(egip,start=1984), lam)
# Prophet
proph_df <- tibble(ds = seq(as.Date("1984-01-01"),as.Date("2019-01-01"),by = "month"),
y = as.numeric(tegip))
m <- prophet(proph_df)
## Initial log joint probability = -2.07089
## Optimization terminated normally:
## Convergence detected: relative gradient magnitude is below tolerance
future <- make_future_dataframe(m, periods = 12*5, freq = "month")
forecast <- predict(m, future)
yhat_P <- tail(forecast$yhat,60)
# ARIMA
yhat_A = Arima(tegip,c(2,1,2),c(2,1,1)) %>% forecast(h=60) %>% .$mean
final_yhat = (yhat_P+yhat_A)/2
final_yhat = InvBoxCox(final_yhat, lam)
autoplot(egip)+autolayer(final_yhat)+
scale_x_continuous(limits = c(2010,2024),breaks = seq(2010,2024))+
theme(legend.position = "none")+
labs(y="EGIP")
One of the pitfalls of the ensemble approach is estimating the prediction intervals. There may be a way to analytically compute and combine the PIs for the two approaches, though I will have to look into this.
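One rough analytic route, assuming the two forecast errors are approximately normal and independent (a strong assumption - correlated errors would need a covariance term, so this is only a sketch; combine_pi is a hypothetical helper, not part of any package):

```r
# Equal-weight average of two forecasts: Var((A + P) / 2) = (var_A + var_P) / 4
# mean_* and sd_* are per-horizon forecast means and standard deviations
combine_pi <- function(mean_A, mean_P, sd_A, sd_P, level = 0.95) {
  mu    <- (mean_A + mean_P) / 2
  sigma <- sqrt((sd_A^2 + sd_P^2) / 4)
  z     <- qnorm(1 - (1 - level) / 2)
  list(mean = mu, lower = mu - z * sigma, upper = mu + z * sigma)
}
```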
Another way to do this computationally is to calculate the intervals using bootstrapping on simulated time series, as Hyndman describes here. Using bootstrapping techniques, he first creates many similar representative time series, then applies a model like ETS / ARIMA to each series, and finally obtains a heuristic prediction interval.
I haven’t done this method before, but I want to learn more about it. So I’ll note it down as a potential future improvement item to this task. [I tried using it here, but it requires creation of a fitted_model object for simulate.]
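A minimal sketch of the simulation idea using the fitted ARIMA model - this simulates future sample paths by resampling residuals, which is simpler than the full bootstrap-and-refit procedure Hyndman describes:

```r
# Simulate many future paths from the fitted model (bootstrap = TRUE resamples
# residuals instead of assuming normal errors), then take empirical quantiles
# at each horizon as a rough prediction interval
fit  <- Arima(tegip, order = c(2, 1, 2), seasonal = c(2, 1, 1))
sims <- replicate(1000, simulate(fit, nsim = 60, future = TRUE, bootstrap = TRUE))
pi_lower <- apply(sims, 1, quantile, probs = 0.025)
pi_upper <- apply(sims, 1, quantile, probs = 0.975)
```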
For now, I’m going to use just one model, the ARIMA model, and get prediction intervals. These are shown below. The dashed line shows the 135 cutoff in question. We can see that in Jan of 2022, 2023 and 2024, there is a possibility that the point estimate crosses 135. In 2024, for example, the model says we should expect the mean estimate to be between 3.17 and 3.27 (at 95% confidence); this contains the threshold of 3.25 (the Box-Cox transformed 135).
The prediction intervals created by auto.arima are quite optimistic; the model isn’t accounting for variation in the predictions due to parameter estimation.
Arima(tegip,c(2,1,2),c(2,1,1)) %>% forecast(h=60) %>% autoplot()+
geom_hline(yintercept = BoxCox(135,lam), lty=2) +
scale_x_continuous(limits = c(2017,2024),breaks = seq(2017,2024))+
labs(y="EGIP", caption="The 135 limit is shown after Box-Cox transformation here.")
Visually, we can plot the distribution of mean estimates for Jan-2024. Just for Jan-2024, the area to the right of the red line indicates the probability of obtaining a mean > threshold, in this case, 8.6%.
sigma_h = 0.02440663 #Calculated from the forecast object using y_PI = y_hat_mean + 1.96 * sigma_h
mu = 3.221144
thr = 3.254454
hist(rnorm(1000, mu, sigma_h))
abline(v=thr, col="red")
1 - pnorm(q = thr, mean = mu, sd = sigma_h) # ~0.086